Designing Reusable Systems that Can Handle Change - Description-Driven Systems: Revisiting Object-Oriented Principles
In the age of the Cloud and so-called Big Data, systems must be increasingly
flexible, reconfigurable and adaptable to change, in addition to being developed
rapidly. As a consequence, designing systems to cater for evolution is becoming
critical to their success. To cope with change, systems must support reuse and
be able to adapt to changing requirements as and when necessary. Allowing
systems to be self-describing is one way to
facilitate this. To address the issues of reuse in designing evolvable systems,
this paper proposes a so-called description-driven approach to systems design.
This approach enables new versions of data structures and processes to be
created alongside the old, thereby providing a history of changes to the
underlying data models and enabling the capture of provenance data. The
efficacy of the description-driven approach is exemplified by the CRISTAL
project. CRISTAL is based on description-driven design principles; it uses
versions of stored descriptions to define various versions of data which can be
stored in diverse forms. This paper discusses the need to capture a holistic
system description when modelling large-scale distributed systems.
Comment: 8 pages, 1 figure and 1 table. Accepted by the 9th Int Conf on the
Evaluation of Novel Approaches to Software Engineering (ENASE'14), Lisbon,
Portugal, April 2014.
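The core mechanism described above lends itself to a compact illustration. The sketch below shows one way a description-driven store might keep every version of a data-structure description alongside the old ones, with each item recording the description version it was created against; all names are illustrative assumptions, not the CRISTAL API.

```python
# Minimal sketch of the description-driven idea: descriptions of data
# structures are themselves versioned objects, so new versions can be
# created alongside old ones and every item records which description
# version it was created against. Names are illustrative, not CRISTAL's.
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Description:
    """A versioned description (schema) of a data structure."""
    name: str
    version: int
    fields: dict[str, type]


@dataclass
class Item:
    """A data item that keeps a reference to the description version it
    was instantiated from, preserving provenance across evolution."""
    description: Description
    values: dict[str, Any]


class DescriptionStore:
    """Keeps every version of every description; old versions are never
    overwritten, giving a history of changes to the data model."""
    def __init__(self) -> None:
        self._versions: dict[str, list[Description]] = {}

    def publish(self, name: str, fields: dict[str, type]) -> Description:
        history = self._versions.setdefault(name, [])
        desc = Description(name, version=len(history) + 1, fields=fields)
        history.append(desc)          # append alongside the old, never replace
        return desc

    def latest(self, name: str) -> Description:
        return self._versions[name][-1]

    def history(self, name: str) -> list[Description]:
        return list(self._versions[name])


store = DescriptionStore()
v1 = store.publish("Detector", {"id": str, "voltage": float})
v2 = store.publish("Detector", {"id": str, "voltage": float, "firmware": str})
old_item = Item(v1, {"id": "D-001", "voltage": 1.2})   # still valid under v1
new_item = Item(v2, {"id": "D-002", "voltage": 1.3, "firmware": "2.0"})
```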
A Description Driven Approach for Flexible Metadata Tracking
Evolving user requirements present a considerable software engineering
challenge, all the more so in an environment where data will be stored for a
very long time, and must remain usable as the system specification evolves
around it. Capturing the description of the system addresses this issue since a
description-driven approach enables new versions of data structures and
processes to be created alongside the old, thereby providing a history of
changes to the underlying data models and enabling the capture of provenance
data. This description-driven approach is advocated in this paper in which a
system called CRISTAL is presented. CRISTAL is based on description-driven
principles; it can use previous versions of stored descriptions to define
various versions of data which can be stored in diverse forms. To demonstrate
the efficacy of this approach, the history of the project at CERN is
presented: CRISTAL was used to track data and process definitions and their
associated provenance data in the construction of the CMS ECAL detector; it
was applied to handle analysis tracking and data index provenance in the
neuGRID and N4U projects; and it will be matured further in the CRISTAL-ISE
project. We believe that the CRISTAL approach could be invaluable in handling
the evolution, indexing and tracking of large datasets, and are keen to apply
it further in this direction.
Comment: 10 pages and 3 figures. arXiv admin note: text overlap with
arXiv:1402.5753, arXiv:1402.576
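To illustrate the provenance side of the approach, the following hedged sketch records each change to a stored description as an append-only provenance event, so the history of the data model can be replayed; the record fields are assumptions made for illustration and are not drawn from the CRISTAL codebase.

```python
# Each change to a stored description is recorded as a provenance event;
# the log is append-only, so it doubles as the change history of the
# underlying data model. All names are illustrative assumptions.
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ProvenanceRecord:
    description_name: str
    version: int
    author: str
    timestamp: datetime
    note: str


class ProvenanceLog:
    """Append-only: earlier records are never modified."""
    def __init__(self) -> None:
        self._records: list[ProvenanceRecord] = []

    def record(self, name: str, version: int, author: str, note: str) -> None:
        self._records.append(ProvenanceRecord(
            name, version, author, datetime.now(timezone.utc), note))

    def history(self, name: str) -> list[ProvenanceRecord]:
        return [r for r in self._records if r.description_name == name]


log = ProvenanceLog()
log.record("Detector", 1, "alice", "initial description")
log.record("Detector", 2, "bob", "added firmware field")
for rec in log.history("Detector"):
    print(rec.version, rec.author, rec.note)
```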
Towards Provenance and Traceability in CRISTAL for HEP
This paper discusses the CRISTAL object lifecycle management system and its
use in provenance data management and the traceability of system events. This
software was initially used to capture the construction and calibration of the
CMS ECAL detector at CERN for later use by physicists in their data analysis.
Some further uses of CRISTAL in different projects (CMS, neuGRID and N4U) are
presented as examples of its flexible data model. From these examples,
applications are drawn for the High Energy Physics domain, and some initial
ideas for its use in data preservation in HEP are outlined in this paper.
Investigations are currently underway to gauge the feasibility of using the
N4U Analysis Service, or a derivative of it, to address the requirements of
data and analysis logging and provenance capture within the HEP long-term data
analysis environment.
Comment: 5 pages and 1 figure. 20th International Conference on Computing in
High Energy and Nuclear Physics (CHEP13), 14-18th October 2013, Amsterdam,
Netherlands. To appear in Journal of Physics Conference Series.
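As a rough illustration of the object lifecycle traceability the paper describes, the sketch below stores every state transition of a managed item as an immutable event; the states and identifiers are invented for the example and do not reflect CRISTAL's actual data model.

```python
# Illustrative sketch (not CRISTAL code) of object lifecycle tracking:
# every state transition is stored as an immutable event, so the full
# construction/calibration history of an item remains traceable.
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class LifecycleEvent:
    state: str        # e.g. "constructed", "calibrated", "installed"
    agent: str        # who or what triggered the transition
    timestamp: datetime


@dataclass
class TrackedItem:
    """The current state is simply the last event; the whole event list
    is the traceability record available for later analysis."""
    item_id: str
    events: list[LifecycleEvent] = field(default_factory=list)

    def transition(self, state: str, agent: str) -> None:
        self.events.append(
            LifecycleEvent(state, agent, datetime.now(timezone.utc)))

    @property
    def state(self) -> str:
        return self.events[-1].state if self.events else "created"


crystal = TrackedItem("ECAL-crystal-0001")
crystal.transition("constructed", "assembly-line-3")
crystal.transition("calibrated", "test-bench-7")
print(crystal.state)        # "calibrated"
print(len(crystal.events))  # full trace preserved
```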
Scientific Workflow Repeatability through Cloud-Aware Provenance
Recording the provenance of the transformations, analyses and interpretations
of data in scientific workflows is vital for their repeatability and
reliability. Such provenance capture has been carried out effectively in
Grid-based scientific workflow systems. However, the recent adoption of
Cloud-based scientific workflows presents an opportunity to investigate the
suitability of existing approaches, or to propose new approaches, for
collecting provenance information from the Cloud and using it for workflow
repeatability on Cloud infrastructure. The dynamic nature of the Cloud makes
this difficult because, unlike on the Grid, resources are provisioned
on-demand. This paper presents a novel approach that can assist in mitigating
this challenge: it collects Cloud infrastructure information along with
workflow provenance and establishes a mapping between them, which is later
used to re-provision resources on the Cloud. Workflow execution is repeated
by: (a) capturing the Cloud infrastructure information (virtual machine
configuration) along with the workflow provenance, and (b) re-provisioning
similar resources on the Cloud and re-executing the workflow on them. The
evaluation of an initial prototype suggests that the proposed approach is
feasible and merits further investigation.
Comment: 6 pages; 5 figures; 3 tables. In Proceedings of the Recomputability
2014 workshop of the 7th IEEE/ACM International Conference on Utility and
Cloud Computing (UCC 2014), London, December 2014.
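The two-step approach in the abstract can be sketched directly. The code below is a minimal illustration rather than the authors' implementation: it captures a virtual machine configuration alongside each job's workflow provenance, then uses that mapping to re-provision and re-execute; provision_vm and run_job stand in for real cloud and workflow-engine calls and are hypothetical.

```python
# Sketch of cloud-aware provenance, assuming a generic cloud API:
# (a) capture the VM configuration with the workflow provenance, and
# (b) re-provision similar resources and re-run the workflow on them.
from dataclasses import dataclass


@dataclass(frozen=True)
class VMConfig:
    """The infrastructure half of the provenance mapping."""
    image: str
    vcpus: int
    ram_gb: int


@dataclass(frozen=True)
class JobProvenance:
    """One workflow step mapped to the VM configuration it ran on."""
    job_name: str
    inputs: tuple[str, ...]
    command: str
    vm: VMConfig


def capture(job_name, inputs, command, vm: VMConfig) -> JobProvenance:
    # Step (a): record workflow provenance together with the VM config.
    return JobProvenance(job_name, tuple(inputs), command, vm)


def repeat(prov: JobProvenance, provision_vm, run_job) -> None:
    # Step (b): re-provision a similar VM from the stored configuration,
    # then re-execute the recorded command on it. provision_vm and
    # run_job are hypothetical callables supplied by the caller.
    vm_handle = provision_vm(image=prov.vm.image,
                             vcpus=prov.vm.vcpus,
                             ram_gb=prov.vm.ram_gb)
    run_job(vm_handle, prov.command, prov.inputs)


prov = capture("analyse-run42", ["run42.dat"],
               "python analyse.py run42.dat",
               VMConfig(image="ubuntu-22.04", vcpus=4, ram_gb=16))
```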
Designing Traceability into Big Data Systems
Providing an appropriate level of accessibility and traceability to data or
process elements (so-called Items) in large volumes of data, often
Cloud-resident, is an essential requirement in the Big Data era.
Enterprise-wide data systems need to be designed from the outset to support
usage of such Items across the spectrum of business use rather than from any
specific application view. The design philosophy advocated in this paper is to
drive the design process using a so-called description-driven approach, which
enriches models with metadata and descriptions and focuses the design process
on Item re-use, thereby promoting traceability. Details are given of the
description-driven design of big data systems at CERN, in health informatics
and in business process management. Evidence is presented that the approach
leads to design simplicity and consequent ease of management thanks to loose
typing and the adoption of a unified approach to Item management and usage.
Comment: 10 pages; 6 figures. In Proceedings of the 5th Annual International
Conference on ICT: Big Data, Cloud and Security (ICT-BDCS 2015), Singapore,
July 2015. arXiv admin note: text overlap with arXiv:1402.5764,
arXiv:1402.575
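A brief sketch may help make the "everything is an Item" philosophy concrete. Assuming nothing beyond the abstract, the code below gives data and process elements one loosely typed representation with derivation links, so a single store can manage, version and trace both uniformly; all structures are illustrative.

```python
# Sketch of unified, loosely typed Item management: the kind is a tag,
# the payload is free-form, and trace links record derivation, so data
# and process elements are handled by one store. Illustrative only.
from dataclasses import dataclass, field
from typing import Any
import uuid


@dataclass
class Item:
    kind: str                          # e.g. "data", "process"
    payload: dict[str, Any]
    item_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    derived_from: list[str] = field(default_factory=list)


class ItemStore:
    def __init__(self) -> None:
        self._items: dict[str, Item] = {}

    def add(self, item: Item) -> str:
        self._items[item.item_id] = item
        return item.item_id

    def trace(self, item_id: str) -> list[Item]:
        """Walk derivation links back to the roots (the traceability)."""
        item = self._items[item_id]
        lineage = [item]
        for parent_id in item.derived_from:
            lineage.extend(self.trace(parent_id))
        return lineage


store = ItemStore()
raw = store.add(Item("data", {"path": "/data/run42.root"}))
calib = store.add(Item("process", {"name": "calibrate"}))
out = store.add(Item("data", {"path": "/data/run42_cal.root"},
                     derived_from=[raw, calib]))
print([i.kind for i in store.trace(out)])   # ['data', 'data', 'process']
```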
The Requirements for Ontologies in Medical Data Integration: A Case Study
Evidence-based medicine is critically dependent on three sources of
information: a medical knowledge base, the patient's medical record and
knowledge of available resources including, where appropriate, clinical
protocols. Patient data is often scattered in a variety of databases and may,
in a distributed model, be held across several disparate repositories.
Consequently, addressing the needs of an evidence-based medicine community
presents issues of biomedical data integration, clinical interpretation and
knowledge management. This paper outlines how the Health-e-Child project has
approached the challenge of requirements specification for (bio-) medical data
integration, from the level of cellular data, through disease to that of
patient and population. The approach is illuminated through the requirements
elicitation and analysis of Juvenile Idiopathic Arthritis (JIA), one of three
diseases being studied in the EC-funded Health-e-Child project.
Comment: 6 pages, 1 figure. Presented at the 11th International Database
Engineering & Applications Symposium (Ideas2007), Banff, Canada, September
2007.
Analysis traceability and provenance for HEP
This paper presents the use of the CRISTAL software in the N4U project. CRISTAL was used to create a set of provenance-aware analysis tools for the neuroscience domain. The paper advocates that the approach taken in N4U to build the analysis suite is sufficiently generic to be applied to the HEP domain. A mapping to the PROV model for provenance interoperability is also presented, together with how it can be applied in the HEP domain to make HEP analyses interoperable.
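As a rough indication of what such a mapping might look like, the sketch below emits simple PROV-N statements for one recorded analysis step, mapping a data item to prov:Entity, an execution to prov:Activity and an operator to prov:Agent; only the PROV terms come from the W3C standard, while the record layout is a hypothetical example, not the N4U schema.

```python
# CRISTAL-style concepts mapped onto W3C PROV: item -> prov:Entity,
# execution/event -> prov:Activity, operator -> prov:Agent.
def to_prov_n(job: dict) -> list[str]:
    """Emit simple PROV-N statements for one recorded analysis step."""
    stmts = [
        f"entity(ex:{job['output']})",
        f"activity(ex:{job['activity']})",
        f"agent(ex:{job['agent']})",
        f"wasGeneratedBy(ex:{job['output']}, ex:{job['activity']}, -)",
        f"wasAssociatedWith(ex:{job['activity']}, ex:{job['agent']}, -)",
    ]
    stmts += [f"used(ex:{job['activity']}, ex:{i}, -)" for i in job["inputs"]]
    return stmts


# Hypothetical analysis record for illustration.
record = {"activity": "fit_mass_peak", "agent": "analyst01",
          "inputs": ["ntuple_2012B"], "output": "fit_result_v1"}
print("\n".join(to_prov_n(record)))
```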
Provision of an integrated data analysis platform for computational neuroscience experiments
Purpose – The purpose of this paper is to provide an integrated analysis base to facilitate computational neuroscience experiments, following a user-led approach to provide access to the integrated neuroscience data and to enable the analyses demanded by the biomedical research community.
Design/methodology/approach – The design and development of the N4U analysis base and related information services addresses the existing research and practical challenges by offering an integrated medical data analysis environment with the necessary building blocks for neuroscientists to optimally exploit neuroscience workflows, large image data sets and algorithms to conduct analyses.
Findings – The provision of an integrated e-science environment for computational neuroimaging can enhance the prospects, speed and utility of the data analysis process for neurodegenerative diseases.
Originality/value – The N4U analysis base enables biomedical data analyses to be conducted by indexing and interlinking the neuroimaging and clinical study data sets stored on the grid infrastructure, together with algorithms and scientific workflow definitions and their associated provenance information.
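The indexing-and-interlinking idea at the heart of the analysis base can be sketched minimally: the toy index below cross-links datasets, workflow definitions and provenance records so that everything related to an analysis can be resolved from any one of its parts; the key scheme is an assumption made for illustration.

```python
# Toy sketch of indexing and interlinking: datasets, workflows and
# provenance records are cross-linked in both directions, so an
# analysis can be resolved to everything it touched. Keys are invented.
from collections import defaultdict


class AnalysisIndex:
    def __init__(self) -> None:
        self._links: dict[str, set[str]] = defaultdict(set)

    def link(self, a: str, b: str) -> None:
        # Store the link symmetrically so lookups work from either end.
        self._links[a].add(b)
        self._links[b].add(a)

    def related(self, key: str) -> set[str]:
        return set(self._links[key])


index = AnalysisIndex()
index.link("image:MRI-0042", "study:ADNI-baseline")
index.link("workflow:cortical-thickness-v3", "image:MRI-0042")
index.link("workflow:cortical-thickness-v3", "prov:run-2013-07-01")
print(index.related("image:MRI-0042"))
```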